Sync master with upstream release b6374 #240

jan-service-account · 2025-09-04T00:32:27Z

Updates dev branch with latest release (b6374) from ggml-org/llama.cpp

* ggml-cpu : optimize rvv ggml_vec_dot_f32
* ggml-cpu : optimize 128-bit rvv ggml_vec_dot_q4_K_q8_K
* ggml-cpu : fix riscv arch flags
* ggml-cpu : add more rvv ops
* ggml-cpu : optimize rvv ggml_vec_dot_q4_K_q8_K
* ggml-cpu : optimize rvv ggml_vec_dot_q6_K_q8_K
* ggml-cpu : minor rvv adjustments
* ggml-cpu : fix riscv include

…org#15765)

* model-conversion : remove hardcoded /bin/bash shebangs [no ci]

  This commit updates the bash scripts to use env instead of a hardcoded /bin/bash in the shebang line.

  The motivation for this is that some systems may have bash installed in a different location, and using /usr/bin/env bash ensures that the script will use the first bash interpreter found in the user's PATH, making the scripts more portable across different environments.

* model-conversion : rename script to .py [no ci]

  This commit renames run-casual-gen-embeddings-org.sh to run-casual-gen-embeddings-org.py to reflect its Python nature.

This commit fixes the model type for the Gemma 270M model in llama_model.cpp which should be LLM_TYPE_270M. I incorrectly added this previously as LLM_TYPE_537M which was wrong.

The motivation for this is that it causes the model to not be identified properly when using tools like llama-bench. For example:

```console
$ ./build/bin/llama-bench -m models/gemma-3-270m-Q8_0.gguf
| model                          |       size | ...
| ------------------------------ | ---------: | ...
| gemma3 ?B Q8_0                 | 271.81 MiB | ...
| gemma3 ?B Q8_0                 | 271.81 MiB | ...
```

With the changes in this commit the output will be:

```console
$ ./build/bin/llama-bench -m models/gemma-3-270m-Q8_0.gguf
| model                          |       size | ...
| ------------------------------ | ---------: | ...
| gemma3 270M Q8_0               | 271.81 MiB | ...
| gemma3 270M Q8_0               | 271.81 MiB | ...
```

…-6% perf E2E (ggml-org#15715)

* Add fastdiv, use it in modulo and use modulo in rms_norm_f32

  Fastdiv is a much faster way to do integer division, which was identified as a bottleneck in rms_norm_f32

* Support more `block_size` values in `rms_norm_f32`

  This makes us more flexible in selecting the optimal threads w.r.t. parallelizing across a col vs. launch overheads of threads and mio throttles

* Update ggml/src/ggml-cuda/common.cuh

  Co-authored-by: Johannes Gäßler <[email protected]>

* Replace modulo with fastmodulo in `rms_norm_f32`

* Use `BinPackArguments=true` for formatting function calls

  Will file a separate PR to adjust .clang-format file

* Update ggml/src/ggml-cuda/common.cuh

  Co-authored-by: Johannes Gäßler <[email protected]>

* Use uint3 for both `fastdiv` and `fastmodulo`

  The compiler seems to reliably optimize away the unused .z component in the fastdiv use-case, see https://godbolt.org/z/rx8KPrKr3

* More constrained type declarations

  Co-authored-by: Johannes Gäßler <[email protected]>

* Rename fastdiv and fastmodulo variables to shared variable name

  As suggested by JohannesGaessler, this increases clarity of the intended use

* Pack fastdiv/fastmodulo constants into uint2/uint3 objects

  By packing constants that are used together into a struct, we are less likely to make errors.

* Rename function parameter of fastmodulo

  `modulo_consts` is more fitting/descriptive

---------

Co-authored-by: Johannes Gäßler <[email protected]>
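For readers unfamiliar with the technique, the following is a minimal host-side C++ sketch of division by a runtime-invariant divisor using a precomputed multiplier and shift. It is not the code from this PR: `FastDivConsts`, `init_fastdiv`, and the plain-struct layout are hypothetical names chosen for illustration (the commits above describe packing the constants into `uint2`/`uint3`), and the sketch widens the intermediate sum to 64 bits so it stays correct for any 32-bit numerator.

```cpp
#include <cstdint>

// Hypothetical container for the precomputed constants (stands in for a packed vector type).
struct FastDivConsts {
    uint32_t mp;    // magic multiplier
    uint32_t shift; // L = ceil(log2(d))
    uint32_t d;     // original divisor, needed for the modulo
};

// Precompute the constants once per divisor d (d must be >= 1).
inline FastDivConsts init_fastdiv(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    // mp = floor(2^32 * (2^L - d) / d) + 1, which always fits in 32 bits
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return {static_cast<uint32_t>(mp), L, d};
}

// n / d using one multiply, one add and one shift instead of a hardware divide.
inline uint32_t fastdiv(uint32_t n, FastDivConsts c) {
    const uint64_t hi = (uint64_t{n} * c.mp) >> 32;     // high 32 bits of n * mp
    return static_cast<uint32_t>((hi + n) >> c.shift);  // 64-bit sum avoids overflow
}

// n % d derived from the fast quotient.
inline uint32_t fastmodulo(uint32_t n, FastDivConsts c) {
    return n - fastdiv(n, c) * c.d;
}
```

The constants are computed once on the host for a given divisor (for example a tensor dimension) and then reused for every element, which is where the saving over a per-element integer division would come from.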

…5666)

* vulkan : update ggml_vk_instance_validation_ext_available

  This commit updates ggml_vk_instance_validation_ext_available() to check for VK_EXT_validation_features instead of VK_KHR_portability_enumeration.

  Based on how the returned boolean is used later in the code (to enable both the validation layer and the VK_EXT_validation_features extension), it appears the function may have been intended to check for the validation layer features extension.

* remove try/catch

  This was a leftover from a previous iteration where I was explicitly querying for a specific validation layer first, which would throw.

* update warning message about validation layers

noemotiovon and others added 14 commits September 3, 2025 10:43

CANN: Fix type float_t to float (ggml-org#15736)

8a2234e

Signed-off-by: noemotiovon <[email protected]>

CANN: Mask unsupported TRANSPOSE_1D operator (ggml-org#15733)

f6da8cb

CANN currently does not support kernels larger than 255. This change disables such cases.

model-conversion : add missing curl script [no ci] (ggml-org#15761)

8c3fdf4

This commit adds the curl script that is currently missing from the model-conversion examples. The script is required to run the embedding server targets, which test the llama-server embeddings functionality.

CANN: Add RoPE contiguous check for 310I DUP device (ggml-org#15735)

5eae934

sampling : optimize dist sampler (ggml-org#15704)

cdedb70

ggml-ci

model-conversion : fix pyright errors (ggml-org#15770)

407c237

This commit addresses type errors reported by pyright in the model conversion scripts.

ggml vulkan: add hardsigmoid and hardswish operations (ggml-org#15762)

0014fb4
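
As background for the two operations named in this commit, here is a minimal scalar C++ sketch of the commonly used definitions, hardsigmoid(x) = min(1, max(0, (x + 3) / 6)) and hardswish(x) = x * hardsigmoid(x). It assumes the Vulkan shaders follow these standard formulas; the function names are illustrative and this is not the shader code added by the commit.

```cpp
#include <algorithm>

// Piecewise-linear approximation of the sigmoid, clamped to [0, 1].
inline float hardsigmoid(float x) {
    return std::min(1.0f, std::max(0.0f, (x + 3.0f) / 6.0f));
}

// Input scaled by its hardsigmoid; equals x for x >= 3 and 0 for x <= -3.
inline float hardswish(float x) {
    return x * hardsigmoid(x);
}
```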

vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)

0fce7a1

* vulkan: don't use std::string in load_shaders, to improve compile time
* keep the string version for those calls that use it

vulkan: fix mmv subgroup16 selection (ggml-org#15775)

dff7551

jan-service-account merged commit 185a645 into dev Sep 4, 2025
13 checks passed

jan-service-account deleted the update-dev-from-master-2025-09-04-00-32 branch September 4, 2025 00:46

11 participants